Abstract

In this paper, we propose a new propagation vector for malicious software that abuses the Tor network. Tor is particularly relevant, since operating a Tor exit node is easy and involves low costs compared to attacking institutional or ISP networks. After presenting the Tor network from an attacker's perspective, we describe an automated exploitation malware that operates on a Tor exit node and aims to infect web browsers. Our experiments show that the currently deployed Tor network provides a large number of potential victims.

1 Introduction 

The ubiquitous computing and network infrastructure currently deployed is exposed to numerous risks. Recently, the Conficker worm, a self-spreading malicious software (malware), infected millions of machines [9]. Malware often uses multiple attack vectors: according to the authors of [9], the Conficker worm can also propagate via network shares and USB sticks. Moreover, some malicious web pages infect their visitors with malware [17]. Some users believe that Tor [6], an anonymous communication service, can help to mitigate privacy and confidentiality attacks. Tor can be summarized as an overlay network that aims to hide its users' identity, a property that has been formally proved [7]. According to Eric Cronin [5], eavesdropping is a difficult task because packets can be misinterpreted. An additional problem for attackers is to wiretap at a strategic point where multiple hosts can be sniffed. However, Tor simplifies traffic eavesdropping for an attacker: she simply needs to install Tor exit nodes and participate in the Tor network. Another advantage of Tor for an attacker is its proven anonymity, which makes it tempting to create a stealthy and anonymous command and control center for controlling the eavesdropping and the infection of machines.

In this paper, we propose a novel propagation mechanism for malicious software via Tor. The contributions of this paper are:

• an estimation of the population of vulnerable browsers, used to tune the web browser infection;

• a mechanism to enforce interactions with web browsers in order to distribute malicious payloads.

The remainder of this paper is organized as follows: Section 2 describes related work and focuses on potential attacks on Tor. Section 3 presents an attacker incentive model, which motivates the design and implementation of an automated exploitation malware using the Tor network, described in Section 4. Section 5 concludes the article and outlines future work.



2 Related work 

The main purpose of Tor is to provide anonymous communication services. This is achieved by setting up an overlay network composed of entry guard nodes, relay nodes and exit nodes. A client that wants to use the Tor network connects to an entry guard and then establishes a circuit towards an exit node. In this circuit, each node only knows its predecessor and its successor [6]. Profiling attacks on encrypted web proxy traffic have already been studied by analyzing the number of bytes exchanged [8]. McCoy et al. studied Tor traffic [10]. They captured traffic at entry guards and exit nodes, which allowed them to study some cleartext protocols such as HTTP and Telnet. The purpose of their study was to gain insight into how Tor is used. They could establish the number of different users passing through their entry guard, because they could see where the users were connecting from. However, traffic observed at an exit node is already anonymized, which makes it hard to distinguish users.
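In a Tor circuit, each relay learns only the hops adjacent to it. The toy model below is purely illustrative and is not part of Tor or Torinj: it ignores encryption, directory authorities and circuit construction, and the helper `hop_views` and the node labels are hypothetical. It only lists which neighbours each relay can see, showing that no single relay sees both the client and the destination.

```python
from typing import Dict, List, Tuple


def hop_views(client: str, relays: List[str], destination: str) -> Dict[str, Tuple[str, str]]:
    """Toy model: map each relay of a circuit to the (previous, next) hop it can see.

    Relay names are assumed to be unique; encryption and circuit setup are ignored.
    """
    path = [client, *relays, destination]
    # Every relay sits between path[i - 1] and path[i + 1]; it sees nothing else.
    return {path[i]: (path[i - 1], path[i + 1]) for i in range(1, len(path) - 1)}


views = hop_views("client", ["guard", "middle", "exit"], "web server")
for relay, (prev_hop, next_hop) in views.items():
    print(f"{relay:>6}: sees {prev_hop!r} -> {next_hop!r}")
```

In this model the exit relay sees the destination but only the middle relay as its predecessor, which is why traffic observed at an exit node is hard to attribute to individual users.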

From that paper, it can be concluded that HTTP is the most used protocol. A threat model for the Tor network was proposed in [6] and [10]. An attacker can intercept some fraction of the traffic. She can also generate, modify and delay some traffic, and can compromise a fraction of the Tor nodes. Roger Dingledine et al. described various attacks on the different Tor nodes [6], and McCoy et al. even present countermeasures to detect Tor exit nodes that are intercepting traffic [10]. Their major assumption is that the attacker performs reverse DNS lookups in real time. Furthermore, efforts have been made to strip sensitive information, such as user agents and cookies, from HTTP requests. Privoxy [16] is a local proxy implementation that hides some sensitive information. An experiment performed by Dan Egerstad [21] showed that many Tor users transmit sensitive information, such as account names, user names and passwords, through the Tor network without end-to-end encryption. Mike Perry [15] describes security improvements for the Tor network, especially against application-level attacks at exit nodes. One of the proposed improvements is to distribute the usage of Tor exit nodes carefully across disjoint IP networks. Mike Perry also announced plans to compute checksums of carefully selected web pages in order to detect injection attacks.

3 Attacker Incentive Model 

As discussed in Section 2, attackers can easily eavesdrop on traffic at a Tor exit node. In this paper we go a step further and propose an automated exploitation malware that is capable of infecting browsers whose traffic passes through an exit node. To this end, an attacker should be able to estimate the population of vulnerable browsers and to enforce an interaction with these browsers.

3.1 Passive attacks 

Despite tools like Privoxy that try to strip most of this information, some users still disclose browser information such as user agents and cookies. Many browsers set the User-Agent string. Some browsers start this string with the browser's name followed by its version. Other browsers set the browser family first and then put the browser name between brackets. Furthermore, some browsers provide information about the underlying operating system and the libraries used. Although user agent naming follows no common organization, these strings give us some insight into the users who are browsing via our exit node.
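To make the two layouts described above concrete, the following minimal sketch splits a User-Agent value into `product/version` tokens and parenthesised comments. It is a simplified, hypothetical helper (`parse_user_agent` is not part of the Torinj code), the two strings are illustrative examples rather than captured data, and real-world quirks such as nested parentheses are not handled.

```python
import re
from typing import List, Tuple

# Either a parenthesised comment, or a product token with an optional "/version".
TOKEN_RE = re.compile(r"\(([^)]*)\)|([^\s/()]+)(?:/([^\s()]+))?")


def parse_user_agent(ua: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split a User-Agent string into (product, version) pairs and comments.

    Simplified: nested parentheses and other quirks are not handled.
    """
    products: List[Tuple[str, str]] = []
    comments: List[str] = []
    for m in TOKEN_RE.finditer(ua):
        if m.group(1) is not None:
            # Text inside brackets: family, OS, locale, libraries...
            comments.append(m.group(1))
        else:
            # Product token; the version is empty when no "/version" follows.
            products.append((m.group(2), m.group(3) or ""))
    return products, comments


# Browser name first, followed by its version:
print(parse_user_agent("Opera/9.64 (Windows NT 5.1; U; en) Presto/2.1.1"))
# ([('Opera', '9.64'), ('Presto', '2.1.1')], ['Windows NT 5.1; U; en'])

# Browser family first, browser name inside the brackets:
print(parse_user_agent("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"))
# ([('Mozilla', '4.0')], ['compatible; MSIE 7.0; Windows NT 5.1'])
```

For the second layout, the actual browser name and version have to be read from the comment rather than from the leading product token.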

Furthermore, the MITRE organization hosts the Common Vulnerabilities and Exposures (CVE) database, which lists known software vulnerabilities, including browser vulnerabilities, from 1999 until now. Suppose we observe n browsers, of which V are vulnerable and $\bar{V}$ have no reported vulnerability, so that $n = V + \bar{V}$. We can then compute the browser vulnerability ratio b defined in Eq. (1):

$$b = \frac{V}{n} = \frac{V}{V + \bar{V}} \qquad (1)$$

If all observed user agents are vulnerable, the browser vulnerability ratio becomes 1, and if no observed user agent is vulnerable, b = 0.
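The browser vulnerability ratio b = V/n translates directly into code. In this minimal sketch, `is_vulnerable` is a hypothetical stand-in for looking a user agent up in the CVE list (the matching logic is not reproduced here), and the browser names are placeholders.

```python
from typing import Callable, Iterable


def browser_vulnerability_ratio(user_agents: Iterable[str],
                                is_vulnerable: Callable[[str], bool]) -> float:
    """Return b = V / n for the observed user agents."""
    agents = list(user_agents)
    n = len(agents)  # n: number of observed browsers
    if n == 0:
        raise ValueError("no browsers observed, b is undefined")
    v = sum(1 for ua in agents if is_vulnerable(ua))  # V: flagged as vulnerable
    return v / n


# Hypothetical stand-in for a CVE lookup: a fixed set of flagged user agents.
flagged = {"BrowserA/1.0", "BrowserB/2.1"}
observed = ["BrowserA/1.0", "BrowserB/2.1", "BrowserA/1.0", "BrowserC/3.0"]
print(browser_vulnerability_ratio(observed, flagged.__contains__))  # 0.75
```

Both limits from the text hold: b = 1 when every observed user agent is flagged and b = 0 when none is.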



3.2 Active attacks 

As previously described, the browser vulnerability ratio can be computed. However, user agent strings can be forged, and tools like Privoxy change them. Moreover, proxies or browsers can be configured not to download the external objects attached to a web page, which is where an attacker can place an infection payload dedicated to web browsers. Hence, feedback from the observed HTTP traffic is desired.

An attacker can tag HTML responses to obtain this feedback. In practice, she can set up a man-in-the-middle attack by installing a transparent proxy on the Tor exit node.

She can insert n images or other objects into intercepted HTML documents. When a regular browser parses these pages, it tries to fetch these objects: normally, the host name in the object's URL is first resolved, and then the object is downloaded.

We define a tag as an object that is injected into the intercepted HTML traffic, and we propose two tags per HTML response: a static tag and a dynamic tag.

3.2.1 Static tag injection 

A static tag is a fixed, invisible image that is inserted into HTTP responses. The image measures 1 × 1 pixel and is invisible so as not to distract the user looking at the HTML page. The URL of the image is the same for all users. We assume that the DNS cache on the user's machine works correctly, so that the image's domain name is looked up only once while the user is browsing. Thus, each lookup of the static image's domain name corresponds to one user, and we can count the number of different users.

3.2.2 Dynamic tag injection 

The purpose of the dynamic tag is to enforce an interaction for each visited web page. If only a static image is used, the image is normally resolved once and then kept in the cache for all subsequent web pages visited during the lifetime of the browser. To avoid this caching mechanism, an attacker can generate a unique subdomain for each injected dynamic image, which forces the machine hosting the browser to perform a DNS lookup for each page. An attacker can also observe whether a user comes back. This happens, for instance, when the user restarts her machine and reloads in her browser a particular web page in which the attacker previously injected an image located on a unique subdomain. Hence, if the attacker sees more than one hit for a uniquely generated subdomain, she can deduce that the same user has reappeared.

3.3 Attacker Information sources 

By intercepting and tagging HTML documents, an attacker can exploit three information sources.

DNS server We assume that an attacker controls a DNS server serving the unique subdomains generated for each dynamic tag. The attacker can log all DNS queries, including the source IP addresses that issue them.

Web proxy Tag injection can be performed through a man-in-the-middle attack: an attacker can compromise an exit node and set up a transparent proxy for inserting the tags. From this web proxy, the attacker can record all HTTP header information, such as user agents or cookies.

TCP traffic After having compromised an exit node, an attacker can also record all outgoing traffic from the exit node. Thus, she has access to the full communications of the Tor users whose traffic leaves through that node. The attacker can focus on HTTP responses, especially on the MIME type of a message, in order to tune her browser infection. For instance, if she notices that most HTTP responses are HTML documents, she could inject images into the transferred HTML documents. If, however, she sees that most transferred documents are PDF files, she could launch PDF attacks.

4 Torinj: An Automated Exploitation Malware

To validate the attacker incentive model, we implemented a proof-of-concept malware called Torinj. Torinj is composed of three components: an unmodified Tor client, an embedded intercepting proxy and a hidden C&C (command and control) channel. An overview is shown in Figure 1. A standard, unmodified Tor client is integrated with Torinj and provides access to the Tor network layer. Torinj behaves like any other Tor client and provides similar services, such as relay or exit functionality. Torinj includes a small HTTP proxy used to intercept and relay HTTP requests. Interception and relaying are activated by the attacker using the C&C channel. The hidden C&C channel relies on the hidden service protocol [20] available in Tor to provide some anonymity [13] to the command and control interface and its user. The attacker accesses the C&C channel of each Torinj bot through the Tor network.

Figure 1: An overview of the Torinj framework

Torinj infection works at the interception level and does not need to lure users into connecting to attractive services. It is performed on the unencrypted HTTP requests and responses crossing the infected exit node. The exploitation mechanism of Torinj is composed of two steps:

passive attacks, where the Torinj HTTP proxy gathers essential information about the HTTP requests (e.g. browser user agent, Internet media type) without altering them;

active attacks, where Torinj exploits the HTTP requests by modifying the responses based on the optimal infection scenario learned in the previous step.

For further technical details, we refer the reader to our source code¹.

¹ http://www.foo.be/torinj/

4.1 Experiment setup

During this experiment we used three different machines. On the first machine (M1) we operated an unmodified Tor exit node (v0.2.1.14-rc). On the second machine (M2) we ran BIND [2], version 9.4.2, as DNS server and used tcpdump [19] to capture all DNS queries and responses. On the third machine (M3) we operated an Apache web server [1], version 2.2.6, hosting the transparent image that simulates a malicious payload. From a legal and ethical point of view, we refrained from injecting malicious JavaScript payloads such as XSS-proxy or BeEF [3].

All the machines were synchronized with NTP [11] in order to have accurate timestamps. After our node started participating in the Tor network, we set up a web proxy written in Perl and taken from CPAN [4] (version 0.23). This proxy was extended to inject tags using regular expressions. We used iptables [12] to redirect the traffic going from the Tor exit node to the Internet to our Perl proxy server. The DNS server was configured with a wildcard so that it associates all subdomains with the IP address of our web server. Thus, inside the web proxy we can generate dynamic and static tags that always point to our web server. As information sources we used tcpdump running on M1 and M2, the web server logs and the web proxy logs. The processing was done with Perl, sqlite3 [14] and a modified version of tcpick [18].

4.2 Passive attacks 

We operated a Tor exit node for a period of 28 hours 
and we passively inspected observed HTTP headers. 
In this experiments we observed similar results to Mc- 
Coy et al [10 . We observed that 96% of the traffic 
was HTTP and only 4% of the traffic was end-to-end 
encrypted with HTTPS. 

We also observed 4973 different user agent strings, which confirms the absence of a naming convention for user agents. Only 3.2% of the HTTP requests did not have a user agent set; we assume that these browsers are not vulnerable, although they could be vulnerable versions. Moreover, we automatically looked up each user agent in the CVE list. A total of 1845 user agents (37% of the browsers) did not match any entry in the CVE list, although they may have undisclosed vulnerabilities. If a version is not explicitly set for a given browser, we assume that this browser is not vulnerable (1.3% of the observed browsers). We found 3106 vulnerable unique user agents. Figure 2 confirms that there are more vulnerable browsers than non-vulnerable browsers. We measured the number of vulnerable and non-vulnerable browsers over time intervals of 15 minutes. The number of unique user agent strings keeps growing (Figure 2) because most user agent strings contain version numbers, the browsers they claim to be compatible with, information about the underlying operating system, patch levels and the libraries used. Figure 3 presents the browser vulnerability ratio, which fluctuates around 0.63. This means that, on average, 63 out of 100 browsers are vulnerable according to the CVE list, which shows the potential of our automated exploitation malware.

Figure 2: Vulnerable and non-vulnerable user agents
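The per-interval measurement behind Figure 3 can be sketched as follows, assuming each observed browser has already been classified as vulnerable or not (for example by a CVE lookup, not shown): observations are bucketed into 15-minute intervals and b = V/n is computed per interval. The `(timestamp, is_vulnerable)` input format and the helper names `window_start` and `ratio_per_window` are hypothetical, the timestamps are toy values rather than measured data, and this is not the Perl and sqlite3 processing used for the figures.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

WINDOW_MINUTES = 15  # interval length used in the measurement


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 15-minute interval."""
    return ts - timedelta(minutes=ts.minute % WINDOW_MINUTES,
                          seconds=ts.second, microseconds=ts.microsecond)


def ratio_per_window(observations: Iterable[Tuple[datetime, bool]]) -> Dict[datetime, float]:
    """Browser vulnerability ratio b = V / n for each 15-minute interval.

    Each observation is a (timestamp, is_vulnerable) pair for one observed browser.
    """
    counts: Dict[datetime, List[int]] = defaultdict(lambda: [0, 0])  # [V, n]
    for ts, vulnerable in observations:
        bucket = counts[window_start(ts)]
        bucket[1] += 1          # n: browsers seen in this interval
        if vulnerable:
            bucket[0] += 1      # V: vulnerable browsers in this interval
    return {start: v / n for start, (v, n) in sorted(counts.items())}


# Toy timestamps, not measured data.
obs = [
    (datetime(2000, 1, 1, 10, 2), True),
    (datetime(2000, 1, 1, 10, 10), False),
    (datetime(2000, 1, 1, 10, 14), True),
    (datetime(2000, 1, 1, 10, 20), True),
]
for start, b in ratio_per_window(obs).items():
    print(start.strftime("%H:%M"), round(b, 2))
# 10:00 0.67
# 10:15 1.0
```

Intervals in which no browser was observed simply do not appear in the result, which avoids dividing by n = 0.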

4.3 Active attacks 

For this experiment we set up and operated the proof-of-concept automated exploitation malware for two and a half hours. We observed 391 different user agents passing through our proxy. The proxy injected 126 static tags and 688 dynamic tags.

The purpose of the static tag is to count the different users. When a user opens her browser, the latter resolves the static tag; the static tag is then kept in the user's cache and should not be resolved again.
Figure 3: Browser vulnerability ratio



On our DNS server we observed 126 hits for the static image, and on our web server we counted 196 hits. Most of the hits on the web server were made through our proxy; however, 80 of them passed through other Tor exit nodes. When the injected tags were downloaded through our proxy, the proxy did not tag these responses again. We also counted 391 different user agent strings. This number is higher than the number of static tag hits: 126 out of 391 corresponds to a ratio of about 32%, which can be explained by the fact that only HTML documents were tagged, while other MIME types were forwarded unchanged.

Each intercepted HTML document having a body element received a unique dynamic tag; in total, our proxy injected 688 dynamic tags.



4.3.1 Mime type distribution 

To get feedback from a user, the injected tag needs to be processed by the user agent, and the user agent then needs to connect back to the attacker. This is often the case when HTML documents are processed. Table 1 shows the MIME type distribution: roughly a third of the traffic going through the proxy is composed of HTML documents.
MIME type                  %
text/html                 33
image/jpeg                24
image/gif                 16
image/png                  6
text/plain                 5
application/x-javascript   4
text/css                   3
text/javascript            3
text/xml                   2
Others                     6

Table 1: MIME type distribution
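Percentages such as those in Table 1 can be derived from the Content-Type headers of the responses seen by the proxy. The sketch below shows one possible way to do this and is not the processing used in the experiment: `mime_type` and `mime_distribution` are hypothetical helpers, and the `top` cut-off that folds rarer types into "Others" is an assumption. Parameters such as `charset` and stray whitespace are stripped so that variants of the same type are counted together.

```python
from collections import Counter
from typing import Dict, Iterable


def mime_type(content_type: str) -> str:
    """Normalise a Content-Type header value to its bare MIME type."""
    # Keep only the part before ';' (drops e.g. "; charset=UTF-8"),
    # remove all whitespace and lowercase the result.
    return "".join(content_type.split(";", 1)[0].split()).lower()


def mime_distribution(values: Iterable[str], top: int = 9) -> Dict[str, float]:
    """Percentage of responses per MIME type; the remainder is grouped as 'Others'."""
    counts = Counter(mime_type(v) for v in values)
    total = sum(counts.values())
    if total == 0:
        return {}
    most_common = counts.most_common(top)
    dist = {mime: 100.0 * n / total for mime, n in most_common}
    others = total - sum(n for _, n in most_common)
    if others:
        dist["Others"] = 100.0 * others / total
    return dist


# Toy header values, not measured data.
headers = ["text/html; charset=UTF-8", "Text/HTML", "image/jpeg", "application / x-javascript"]
print(mime_distribution(headers))
# {'text/html': 50.0, 'image/jpeg': 25.0, 'application/x-javascript': 25.0}
```

With `top=9`, the output has the same shape as Table 1: nine named types plus an "Others" row.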



5 Conclusion and future work 

In this paper, we have described the incentive for an attacker to compromise Tor exit nodes and designed the Torinj scenario targeting the HTTP protocol. The experiments further demonstrate the viability of the Torinj prototype and the inherent interest for an attacker in compromising Tor exit nodes. Our experiments showed that 63% of the browsers passing through an exit node are vulnerable according to the CVE database. Moreover, we showed that interaction with the browsers can be induced by injecting tags into HTML documents; in this way, an interaction per web page can be enforced, which is necessary for malicious payload distribution. However, additional research efforts are needed to complete this proof of concept. First of all, the automated exploitation malware should be operated over a longer period of time and from different IP addresses. We already facilitated this work by making our exploitation software freely available under a GPL license. Furthermore, user agents can be carefully crafted to trick the exploitation malware; therefore, other browser fingerprinting techniques should be explored. We have only tested injection into HTML documents; other MIME types, such as PDF, images and movies, can be explored. We are also planning to improve the infection model to find effective strategies for the attacker to launch automatic infection while limiting the likelihood of detection.